4. Preparing systems for constant-pH simulation

Several steps are necessary to prepare a simulation system for simulation using protons. Most of protons relies on the OpenMM API for Amber files. In order to run constant-pH simulations, the protons package requires input files generated by several programs from the Ambertools suite.

These instructions assume that you have a .pdb file of your system that has all necessary components to start an MD simulation. If you need help adding missing residues, atoms, or other features to your system, consider reading the OpenMM Modeller documentation, or the Amber manual.

Using the examples shown below, we will demonstrate how to generate input for an implicit solvent constant-pH simulation, including

  • An amber topology file (.prmtop), which contains the parameters and topology of the system.
  • A coordinate file (.inpcrd), which contains the 3-dimensional coordinates of your system.
  • A protonation state file (.cpin), which enumerate protonation states of amino acids.

4.1. Obtaining and using Ambertools

Ambertools provides a series of command-line utilities that aid in setting up simulation systems. We will be using these for the manipulation of .pdb and .mol2 files, and for generating parameters for small molecules using the general Amber force field (GAFF). The instructions below have been tested with Ambertools16.

Ambertools16 can be obtained from the Amber website.

For any instructions relating to Ambertools programs, we refer to the Amber manual. This includes instructions on how to install Ambertools.

After following the installation procedure, please make sure you can access the following programs on your command-line:

  • tleap,
  • cpinutil.py,

and if you’re planning to use small molecules:

  • antechamber,
  • parmchk.

We will be using these to prepare systems for constant-pH simulation in OpenMM.

4.2. Fixing residue names and atoms

Several modifications will need to be made to your input file, in order to rename residues, and prevent issues with preexisting hydrogen atoms.

4.2.1. Renaming residues

One of the first steps of preparing an input file for constant-pH simulation is to rename all residues in the pdb file to their most protonated form.

The shortlist for PDB residues that our code supports is as follows:
  • Histidine (HIP)
  • Aspartic acid (AS4)
  • Glutamic acid (GL4)
  • Cysteine (CYS)
  • Lysine (LYS)
  • Arginine (ARG)

Note

You may also need to rename any cysteine residues involved in disulfide bonds from CYS CYX, for further processing steps.

4.2.2. Treating hydrogens

To properly add all necessary hydrogens to the system, it is easiest to first delete all current hydrogens in the system. Otherwise, improperly named hydrogens may cause issues when defining the topology of the system based on the .pdb file. Missing hydrogens will be added back in the next preparation step, so it is safe to delete them at this stage.

4.2.3. An example bash script for processing .pdb files

Calling the following shell script on your pdb file should help you prepare your .pdb file for simulation. It replaces the names of residues in-place in the file, and it removes hydrogens from your file.

pdbfile="input.pdb"

# Rename to constph residues
sed -i 's/HIE/HIP/g' ${pdbfile}
sed -i 's/HID/HIP/g' ${pdbfile}
sed -i 's/HIS/HIP/g' ${pdbfile}
sed -i 's/ASP/AS4/g' ${pdbfile}
sed -i 's/GLU/GL4/g' ${pdbfile}

# Remove hydrogens (print lines with atom names not starting with H)
awk '$3 !~ /^ *H/' ${pdbfile} > tmp && mv tmp ${pdbfile}

Tools like sed and awk can be very useful for manipulating text files quickly, though I recommend that you verify the output manually afterwards.

4.3. Writing out coordinate and parameter files using Leap

The next step in system preparation is generating the Amber coordinate (.inpcrd), and the topology and parameters files (.prmtop). This can be done using the command-line utility tleap, which is available as part of ambertools.

These instructions will suffice if your system does not include any small molecules. If your system contains a ligand, please see the next section.

You can use tleap interactively on the command line. Just type

tleap

And you will see output similar to this

-bash-4.1$ tleap
-I: Adding /home/user/bin/../dat/leap/prep to search path.
-I: Adding /home/user/bin/../dat/leap/lib to search path.
-I: Adding /home/user/bin/../dat/leap/parm to search path.
-I: Adding/home/user/bin/../dat/leap/cmd to search path.

Welcome to LEaP!
(no leaprc in search path)
> █

You can start typing your commands line by line. Alternatively, you can store commands in a text file, and then use

tleap -f tleap.txt

and tleap will run the specified commands automatically. Tleap output can be rather verbose. It is recommended to write the output to file, so you can verify that all steps were executed correctly.

Here is a bash example:

tleap -f tleap.in >> tleap.out 2>&1

You can rename the .out file to anything of your choosing.

4.3.1. Tleap commands

The following sequence of commands should do for a simple pdb file containing one protein structure.

# Load constant ph parameters
source leaprc.constph

# Load the PDB file, rename it to your input file
protein = loadPDB input.pdb

# Validate the input
check protein

# Calculate the total charge, for logging purposes
charge protein

# Write parameters.
saveAmberParm protein complex.prmtop complex.inpcrd

# Write PDB files, optional
savepdb protein complex.pdb

# Exit, make sure not to forget this part
quit

If you need to perform other steps to prepare your system for simulation, please read the Amber manual.

4.3.2. Validating tleap results

If you run interactively, tleap should provide error messages on screen. The output can be rather verbose, so make sure that your terminal is configured to scroll back far.

Alternatively, if you run using an input file, make sure that tleap ran successfully.

I often write output to a log file, and check the log file for errors. Here is a short bash snippet that does the trick.

tleap -f tleap.in >> tleap.out 2>&1

# There might be other error clues. This method isn't fail safe.
tleap_result=$(grep "usage" tleap.out || grep -i "error" tleap.out)

# As long as the grep results are empty
 if [ -z "$tleap_result" ]
 then
   echo -e "\e[32mTleap looks successful. Still, act cautious. She's a slippery one.\e[39m"
 else
   echo -e "\e[31mCaught an error in Tleap. Tough luck, buddy.\e[39m"
   echo $tleap_result
 fi

This procedure generates three different files:

  • complex.prmtop, an Amber topology file which contains the topology and parameters of the protein system.
  • complex.inpcrd, a file containing the coordinates of all atoms in the system
  • complex.pdb, this file is optional. You can use a pdb file in software such as PyMOL, to verify that the prepared structure doesn’t contain mistakes.

You will be needing these to run your OpenMM script.

4.4. Including ligands in your system

Warning

  • Ligand support is a work in progress. We’ve experienced system instability with small molecules in implicit solvent simulations.

If you have a ligand, you will have to prepare your ligand using antechamber, and parmchk. This is used to generate two files

  • ligand.gaff.mol2, a mol2 file with GAFF atom types.
  • ligand.gaff.frcmod, an frcmod file with GAFF parameters for the ligand.

Here is an example of how to run antechamber and parmchk.

antechamber -i ligand.mol2 -fi mol2 -o ligand.gaff.mol2 -fo mol2
parmchk -i ligand.gaff.mol2 -o ligand.gaff.frcmod  -f mol2

You may wish to explore the advanced options of antechamber if you need to generate charges for your ligands. If you want to generate charges in another program, using a .mol2 file should allow you to maintain those charges. Now that you’ve generated parameters for your ligand, these files then need to be added to your leap setup.

Here is an example leap script.

# Load constant ph parameters
source leaprc.constph

# Gaff params
source leaprc.gaff

# Load ligand parameters
ligand = loadMol2 ligand.gaff.mol2
loadAmberParams ligand.gaff.frcmod

# Load the PDBs
protein = loadPDB protein.pdb

# Combine into one complex
complex = combine { protein ligand }

# Validate the input
check complex

# Calculate the total charge, for logging purposes
charge complex

# Write parameters.
saveAmberParm  complex  complex.prmtop complex.inpcrd

# Write PDB files
savepdb protein complex.pdb

# Exit, make sure not to forget this part
quit

Todo

  • In the current version of the code, ligands can not be treated using constant-pH methodologies.

4.5. Generating parameters for amino acid protonation states

The last step to generate input for the constant-pH simulation is to generate a .cpin file for your protein. This file contains the parameters for the different protonation states of the amino acids in the system.

A .cpin file can be generated by cpinutil.py, which is also distributed as part of Ambertools.

cpinutil.py -resnames HIP GL4 AS4 TYR LYS CYS -p complex.prmtop -o complex.cpin